As someone born and brought up in a country with tropical weather, one of the things that intrigued me when I moved to Vancouver was the beautiful spring bloom and fall foliage. When I googled images of spring bloom and fall colors, I found beautiful pictures of Vancouver streets in amazing colors.
So, when I got the dataset, I thought it would be interesting to analyze it and find the locations where I could go and experience this myself.
Fig 1. Spring Bloom
Fig 2. Fall Foliage
We are trying to find the street and block where the colors match the figures above, and the questions of interest for this project follow from that goal.
The libraries needed for this analysis are imported as follows:
import pandas as pd
import altair as alt
import json
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
The dataset is read and stored as follows. Since we are looking for trees along the streets of Vancouver, we filter the dataset for trees located near the curb and keep only the columns that are of interest to us.
tree_data_all = pd.read_csv('vancouver_trees.csv')
tree_data_all = tree_data_all[tree_data_all['curb'] == 'Y']
tree_data_full = tree_data_all[['on_street','neighbourhood_name','common_name','genus_name','on_street_block','latitude','longitude']]
tree_data_full.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27454 entries, 0 to 29999
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   on_street           27454 non-null  object
 1   neighbourhood_name  27454 non-null  object
 2   common_name         27454 non-null  object
 3   genus_name          27454 non-null  object
 4   on_street_block     27454 non-null  int64
 5   latitude            27454 non-null  float64
 6   longitude           27454 non-null  float64
dtypes: float64(2), int64(1), object(4)
memory usage: 1.7+ MB
The columns we are interested in do not contain any null values.
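The read, filter and null-check steps can also be sketched on a tiny stand-in table. The CSV text below is made up for illustration (it is not the real vancouver_trees.csv), and the names toy_csv, columns_of_interest and curb_trees are hypothetical.

```python
import io
import pandas as pd

# Toy CSV standing in for vancouver_trees.csv (rows are made up for illustration).
toy_csv = io.StringIO(
    "tree_id,curb,on_street,genus_name,latitude\n"
    "1,Y,MAPLE ST,ACER,49.2598\n"
    "2,N,WALES ST,ACER,49.2366\n"
    "3,Y,RUPERT ST,PRUNUS,\n"
)

columns_of_interest = ["on_street", "genus_name", "latitude"]

trees = pd.read_csv(toy_csv)
# Keep curbside trees only, then narrow to the columns of interest.
curb_trees = trees.loc[trees["curb"] == "Y", columns_of_interest]

# Count missing values per column; all zeros would mean no nulls.
null_counts = curb_trees.isna().sum()
print(null_counts)
```

Here the third row has no latitude, so the count for that column is 1. In the real dataset every count was 0 for the seven columns we kept.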
The dataset gives the genus name, species name and common name for each tree in Vancouver. For our analysis, the Vancouver tree dataset is filtered for fall and spring genuses. Doing the analysis on the basis of species name or common name is out of the scope of this work. The genuses are sorted manually into two lists, and the dataframe is then filtered with them. The spring genuses are stored in a list named list_flowering, and the fall genuses are stored in a list named list_decidous.
list_flowering = ['AMELANCHIER','CASTANEA','CATALPA','CERCIS','CHITALPA','CLADRASTIS','CORNUS','CRATAEGUS',
'DAVIDIA','KOELREUTERIA','LABURNUM','MAGNOLIA','MALUS','MANGLIETIA','MESPILUS','PAULOWNIA','PRUNUS',
'PYRUS','ROBINIA','SALIX','SOPHORA','STEWARTIA','STYRAX','SYRINGA']
list_decidous = ['ACER','AESCULUS','BETULA', 'CARPINUS', 'CASTANEA','CATALPA','CELTIS', 'CERCIDIPHYLLUM','CERCIS','CLADRASTIS','CORNUS','CORYLUS','CRATAEGUS',
'DAVIDIA','EUCOMMIA','EUONYMUS','FAGUS','FRAXINUS','GINKGO','GLEDITSIA','GYMNOCLADUS','JUGLANS','KOELREUTERIA','LARIX','LIQUIDAMBAR',
'LIRIODENDRON','MAGNOLIA', 'MALUS','MANGLIETIA', 'MESPILUS', 'METASEQUOIA', 'NOTHOFAGUS', 'NYSSA','OSTRYIA','OXYDENDRUM','PARROTIA',
'PLATANUS', 'POPULUS', 'PRUNUS', 'PTELEA', 'PTEROCARYA','PYRUS','QUERCUS', 'RHAMNUS', 'RHUS','ROBINIA','SALIX','SOPHORA', 'SORBUS',
'STEWARTIA', 'STYRAX', 'SYRINGA','TILIA','ULMUS','ZELKOVA']
The next step is to filter the dataframe to get separate dataframes for fall and spring.
tree_data_fall = tree_data_full[tree_data_full['genus_name'].isin(list_decidous)].reset_index(drop=True)
tree_data_fall.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | MAPLE ST | Kitsilano | SYCAMORE MAPLE | ACER | 2900 | 49.259856 | -123.150586 |
| 1 | WALES ST | Renfrew-Collingwood | PRINCETON GOLD MAPLE | ACER | 5200 | 49.236650 | -123.051831 |
| 2 | W BROADWAY | Kitsilano | KARPICK RED MAPLE | ACER | 3600 | 49.264250 | -123.184020 |
| 3 | PENTICTON ST | Renfrew-Collingwood | CHANTICLEER PEAR | PYRUS | 2500 | 49.261036 | -123.052921 |
| 4 | RHODES ST | Renfrew-Collingwood | DAWN REDWOOD | METASEQUOIA | 5600 | 49.233354 | -123.050249 |
tree_data_spring = tree_data_full[tree_data_full['genus_name'].isin(list_flowering)].reset_index(drop=True)
tree_data_spring.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | PENTICTON ST | Renfrew-Collingwood | CHANTICLEER PEAR | PYRUS | 2500 | 49.261036 | -123.052921 |
| 1 | E 53RD AV | Sunset | KWANZAN FLOWERING CHERRY | PRUNUS | 700 | 49.221900 | -123.087772 |
| 2 | FREMLIN ST | Oakridge | JAPANESE FLOWERING CRABAPPLE | MALUS | 6300 | 49.227886 | -123.126944 |
| 3 | W 16TH AV | Shaughnessy | PISSARD PLUM | PRUNUS | 1700 | 49.257081 | -123.144401 |
| 4 | E 5TH AV | Hastings-Sunrise | GOLDENRAIN TREE | KOELREUTERIA | 2800 | 49.265769 | -123.045915 |
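Note that the two lists overlap, which is why the same CHANTICLEER PEAR row shows up in both previews above. A minimal sketch of how to see the overlap, using abbreviated subsets of the two lists (the subset names are hypothetical):

```python
# Abbreviated subsets of list_flowering and list_decidous, for illustration only.
spring_subset = ["AMELANCHIER", "MAGNOLIA", "MALUS", "PRUNUS", "PYRUS"]
fall_subset = ["ACER", "FRAXINUS", "MAGNOLIA", "MALUS", "PRUNUS", "PYRUS", "TILIA"]

# Genuses that count toward both seasons, and those that are spring-only.
both_seasons = sorted(set(spring_subset) & set(fall_subset))
spring_only = sorted(set(spring_subset) - set(fall_subset))

print(both_seasons)  # ['MAGNOLIA', 'MALUS', 'PRUNUS', 'PYRUS']
print(spring_only)   # ['AMELANCHIER']
```

A tree whose genus is in both lists is counted once in tree_data_spring and once in tree_data_fall.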
Visualizing the data on a map is the best way to find out which neighbourhoods have the best fall and spring colors. The GeoJSON for Vancouver was obtained from the Vancouver Data Portal and is available through a URL. To make a base map of Vancouver, we use the GeoJSON URL saved in url_geojson.
url_geojson = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features',type='json'))
data_geojson_remote
Data({
format: DataFormat({
property: 'features',
type: 'json'
}),
url: 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
})
vancouver_map = alt.Chart(data_geojson_remote).mark_geoshape(
color = 'white', opacity= 0.5, stroke='black').encode(
).project(type='identity', reflectY=True)
To obtain the map, the dataset is grouped by neighbourhood; the genus values are counted and the coordinate values are aggregated by their median, as follows. We do this first for the spring dataset and then for the fall dataset.
tree_data_spring_neigh = tree_data_spring.groupby('neighbourhood_name').agg(
{'genus_name':'count','latitude':'median','longitude':'median'}).reset_index()
tree_data_spring_neigh = tree_data_spring_neigh.assign(genus_count = tree_data_spring_neigh['genus_name'])
tree_data_spring_neigh = tree_data_spring_neigh.drop(columns='genus_name')
tree_data_spring_neigh.head()
| | neighbourhood_name | latitude | longitude | genus_count |
|---|---|---|---|---|
| 0 | Arbutus-Ridge | 49.250025 | -123.161904 | 422 |
| 1 | Downtown | 49.277961 | -123.122331 | 78 |
| 2 | Dunbar-Southlands | 49.243260 | -123.186695 | 564 |
| 3 | Fairview | 49.264512 | -123.131607 | 208 |
| 4 | Grandview-Woodland | 49.273117 | -123.063963 | 387 |
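The count-then-rename steps above could also be written in one pass with pandas named aggregation. This is only a sketch on made-up rows; the frame name toy and its values are hypothetical.

```python
import pandas as pd

# Made-up rows in the shape of tree_data_spring.
toy = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Sunset", "Oakridge"],
    "genus_name": ["PRUNUS", "MALUS", "PRUNUS"],
    "latitude": [49.22, 49.24, 49.23],
    "longitude": [-123.09, -123.07, -123.13],
})

# Named aggregation: output column name = (input column, aggregation).
by_neigh = toy.groupby("neighbourhood_name").agg(
    genus_count=("genus_name", "count"),
    latitude=("latitude", "median"),
    longitude=("longitude", "median"),
).reset_index()

print(by_neigh)
```

This yields the genus_count column directly, so the assign and drop steps are not needed.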
A selection on the neighbourhood name is added as follows, which lets us hover over the map to highlight the neighbourhood we are interested in. The spring map is obtained as follows.
title_spring_map = alt.TitleParams('Spring Genus Distribution Map',
subtitle = ['Hover over neighbourhoods for highlighting the corresponding data'])
select_neighbourhood = alt.selection_single(fields=['neighbourhood_name'],on='mouseover',clear='mouseout', bind='legend')
neighbourhood_tree_plot_spring = alt.Chart(data_geojson_remote, title =title_spring_map).mark_geoshape(
stroke = 'black', strokeWidth = 0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme ='redpurple'),title = None,legend=None),
opacity=alt.condition(select_neighbourhood,alt.value(1), alt.value(0.1)),
tooltip =[alt.Tooltip('neighbourhood_name:N', title = 'Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title = 'No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_spring_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity', reflectY =True).add_selection(select_neighbourhood)
vancouver_spring_map = vancouver_map + neighbourhood_tree_plot_spring
To obtain the individual genus counts for each neighbourhood, a scatter plot is built from the original spring dataset and combined with the Vancouver spring map.
title_scatter_spring = alt.TitleParams('Scatter Plot Showing the Spring Genus Distribution',
subtitle = 'Click Neighbourhood Name Legend and hover over the points on the plot for highlighting the corresponding data')
genus_count_scatter_spring = alt.Chart(tree_data_spring, title= title_scatter_spring).mark_circle(size=70,stroke='black').encode(
alt.X('genus_name',title = 'Genus Name',sort='-y'),
alt.Y('count()',title= ' Genus Count'),
alt.Color('neighbourhood_name',title = 'Neighbourhood Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0)),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('count(genus_name)',title ='Genus Count')]).add_selection(select_neighbourhood).properties(width=900,height =300)
vancouver_spring_map & genus_count_scatter_spring
The above process is repeated for the fall dataset. The grouped fall dataset is shown below.
tree_data_fall_neigh = tree_data_fall.groupby('neighbourhood_name').agg(
{'genus_name':'count','latitude':'median','longitude':'median'}).reset_index()
tree_data_fall_neigh = tree_data_fall_neigh.assign(genus_count = tree_data_fall_neigh['genus_name'])
tree_data_fall_neigh = tree_data_fall_neigh.drop(columns = 'genus_name')
tree_data_fall_neigh.head()
| | neighbourhood_name | latitude | longitude | genus_count |
|---|---|---|---|---|
| 0 | Arbutus-Ridge | 49.248710 | -123.161757 | 903 |
| 1 | Downtown | 49.279966 | -123.119568 | 892 |
| 2 | Dunbar-Southlands | 49.245445 | -123.184780 | 1526 |
| 3 | Fairview | 49.263478 | -123.130377 | 719 |
| 4 | Grandview-Woodland | 49.272675 | -123.064132 | 1179 |
The Vancouver fall map is given below.
title_fall = alt.TitleParams('Fall Genus Distribution Map',
subtitle = ['Hover over neighbourhoods for highlighting the corresponding data'])
select_neighbourhood = alt.selection_single(fields=['neighbourhood_name'],on='mouseover',clear='mouseout', bind='legend')
neighbourhood_tree_plot_fall = alt.Chart(data_geojson_remote,title =title_fall).mark_geoshape(
stroke = 'black', strokeWidth = 0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme = 'yelloworangered'),title = None,legend=None),
opacity=alt.condition(select_neighbourhood,alt.value(1), alt.value(0.1)),
tooltip =[alt.Tooltip('neighbourhood_name:N', title = 'Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title = 'No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_fall_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity', reflectY =True).add_selection(select_neighbourhood)
vancouver_fall_map = vancouver_map + neighbourhood_tree_plot_fall
Next is the genus scatter plot for the fall.
title_scatter_fall = alt.TitleParams('Scatter Plot Showing the Fall Genus Distribution',
subtitle = 'Click Neighbourhood Name Legend and hover over the points on the plot for highlighting the corresponding data')
genus_count_scatter_fall = alt.Chart(tree_data_fall,title = title_scatter_fall).mark_circle(size=70,stroke='black').encode(
alt.X('genus_name',title = 'Genus Name',sort='-y'),
alt.Y('count()',title= ' Genus Count'),
alt.Color('neighbourhood_name',title = 'Neighbourhood Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0)),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('count(genus_name)',title ='Genus Count')]).add_selection(select_neighbourhood).properties(width=900,height =400)
vancouver_fall_map & genus_count_scatter_fall
genus_fall_plot = alt.Chart(tree_data_fall_neigh, title = 'Top 5 Neighbourhoods with Most Fall Genuses').mark_bar(color ='orangered').encode(
alt.X('genus_count', title = None, axis= None),
alt.Y('neighbourhood_name',sort ='-x',title = None)).transform_window(
rank = 'rank(genus_count)',
sort = [alt.SortField('genus_count',order ='descending')]).transform_filter(
alt.datum.rank <= 5)
genus_fall_plot = genus_fall_plot + genus_fall_plot.mark_text(align ='left', dx=3).encode(text ='genus_count:Q', color = alt.value('black'))
genus_spring_plot = alt.Chart(tree_data_spring_neigh,title = 'Top 5 Neighbourhoods with Most Spring Genuses').mark_bar(color = 'pink').encode(
alt.X('genus_count',title= None,axis=None),
alt.Y('neighbourhood_name',sort ='-x', title=None)).transform_window(
rank = 'rank(genus_count)',
sort = [alt.SortField('genus_count',order ='descending')]).transform_filter(
alt.datum.rank <= 5)
genus_spring_plot = genus_spring_plot + genus_spring_plot.mark_text(align ='left', dx=3).encode(text ='genus_count:Q', color = alt.value('black'))
genus_season_plot = genus_fall_plot | genus_spring_plot
genus_season_plot
For the given dataset, the plots above show that the top neighbourhoods with the most fall genuses on the streets are 'Kensington-Cedar Cottage', 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Victoria-Fraserview', 'Dunbar-Southlands' and 'Sunset'. Similarly, for spring the neighbourhoods are 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Kensington-Cedar Cottage', 'Victoria-Fraserview', 'Sunset' and 'Dunbar-Southlands'. If someone is visiting Vancouver, I would recommend visiting these neighbourhoods to get the most out of the Vancouver colors!
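The ranking read off the bar charts can also be computed directly in pandas with nlargest. The sketch below uses only the five rows of tree_data_fall_neigh that were printed earlier, so its result is not the city-wide ranking; the names fall_counts and top3 are hypothetical.

```python
import pandas as pd

# First five rows of tree_data_fall_neigh as printed earlier (not the full table).
fall_counts = pd.DataFrame({
    "neighbourhood_name": ["Arbutus-Ridge", "Downtown", "Dunbar-Southlands",
                           "Fairview", "Grandview-Woodland"],
    "genus_count": [903, 892, 1526, 719, 1179],
})

# Pick the three rows with the largest counts, largest first.
top3 = fall_counts.nlargest(3, "genus_count")
print(top3["neighbourhood_name"].tolist())
# ['Dunbar-Southlands', 'Grandview-Woodland', 'Arbutus-Ridge']
```

Running the same call with n=5 on the full grouped frame would give a table to check the chart against.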
The most common genuses for fall and spring are obtained from the following bar plots.
title_genus_fall_plot = alt.TitleParams(text ='Most Common Tree Genuses for Fall')
genus_fall_plot = (alt.Chart(tree_data_fall,title = title_genus_fall_plot).transform_aggregate(
groupby =['genus_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
y=alt.Y('genus_name:N',sort = '-x',title = None),
x=alt.X('gen_count:Q',axis = None),
color = alt.Color(value='orangered'),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]).transform_filter(
alt.datum.rank <= 10))
common_fall_plot = genus_fall_plot + genus_fall_plot.mark_text(align ='left', dx=3).encode(text ='gen_count:Q', color = alt.value('black'))
title_genus_spring_plot = alt.TitleParams(text ='Most Common Tree Genuses for Spring')
genus_spring_plot = (alt.Chart(tree_data_spring,title = title_genus_spring_plot).transform_aggregate(
groupby =['genus_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
y=alt.Y('genus_name:N',sort='-x',title= None),
x=alt.X('gen_count:Q', title = 'Genus Count',axis = None),
color = alt.Color(value='pink'),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]).transform_filter(
alt.datum.rank <= 10))
common_spring_plot = genus_spring_plot + genus_spring_plot.mark_text(align ='left', dx=3).encode(text ='gen_count:Q', color = alt.value('black'))
common_genus_plot = common_fall_plot | common_spring_plot
common_genus_plot
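The rank-then-filter step in these charts has a close pandas analogue. As far as I understand it, the rank window operation gives tied values the same rank and then skips ahead, which matches rank(method="min") in pandas. The sketch below uses made-up counts to show that a filter such as rank <= 2 can return more than two rows when there is a tie.

```python
import pandas as pd

# Made-up genus counts with a tie between PRUNUS and TILIA.
counts = pd.DataFrame({
    "genus_name": ["ACER", "PRUNUS", "TILIA", "FAGUS", "MALUS"],
    "gen_count": [50, 30, 30, 10, 5],
})

# method="min": tied rows share a rank and a gap follows (1, 2, 2, 4, 5).
counts["rank"] = counts["gen_count"].rank(method="min", ascending=False).astype(int)

top2 = counts[counts["rank"] <= 2]
print(top2["genus_name"].tolist())  # ['ACER', 'PRUNUS', 'TILIA']
```

So a "top N" chart built this way may show more than N bars if counts are tied at the cutoff.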
For further analysis, the most common genuses for fall and spring are examined to find the streets with the most colours.
For fall these genuses would be ACER, PRUNUS, FRAXINUS, TILIA, CARPINUS, QUERCUS, FAGUS, MALUS, MAGNOLIA, CRATAEGUS.
For Spring these genuses would be PRUNUS, MALUS, MAGNOLIA, CRATAEGUS, PYRUS.
common_genus_df = tree_data_spring.groupby('genus_name').agg(
{'common_name':'count'}).reset_index().sort_values('common_name',ascending=False).reset_index(drop=True).loc[0:4]
common_genus_spring = list(common_genus_df['genus_name'])
common_genus_df = tree_data_fall.groupby('genus_name').agg(
{'common_name':'count'}).reset_index().sort_values('common_name',ascending=False).reset_index(drop=True).loc[0:9]
common_genus_fall = list(common_genus_df['genus_name'])
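The same top-N list can be obtained more compactly with value_counts. Note that .loc[0:4] above is label-based and includes both endpoints, so it selects five rows. A minimal sketch on made-up data (toy and top_two are hypothetical names):

```python
import pandas as pd

# Made-up genus column: 4 PRUNUS, 3 MALUS, 2 PYRUS, 1 SALIX.
toy = pd.DataFrame(
    {"genus_name": ["PRUNUS"] * 4 + ["MALUS"] * 3 + ["PYRUS"] * 2 + ["SALIX"]}
)

# value_counts sorts by frequency (descending); the index holds the genus names.
top_two = toy["genus_name"].value_counts().head(2).index.tolist()
print(top_two)  # ['PRUNUS', 'MALUS']
```

With head(5) on tree_data_spring and head(10) on tree_data_fall this would play the role of common_genus_spring and common_genus_fall, up to the order of tied genuses.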
For further analysis, we need to filter the spring and fall dataframes for the common genuses found in the previous section.
tree_common_spring = tree_data_spring[tree_data_spring['genus_name'].isin(common_genus_spring)].reset_index(drop=True)
Also, it would be a good idea to observe the common genus distribution in each neighbourhood for fall and spring.
The common genus distribution for spring is plotted.
title_common_spring = alt.TitleParams('The Common Genus Distribution',
subtitle = 'Hover over points for selection')
common_genus_scatter_spring = alt.Chart(tree_common_spring, title = title_common_spring).mark_circle(size=100,stroke='black').encode(
alt.X('neighbourhood_name',title = 'Neighbourhood Name'),
alt.Y('count()', title = 'Genus Count'),
alt.Color('genus_name',title='Genus Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0))).add_selection(select_neighbourhood)
common_genus_scatter_spring
The common genus distribution for fall is as follows.
tree_common_fall = tree_data_fall[tree_data_fall['genus_name'].isin(common_genus_fall)].reset_index(drop=True)
title_common_fall = alt.TitleParams('The Common Genus Distribution',
subtitle = 'Hover over points for selection')
common_genus_scatter_fall = alt.Chart(tree_common_fall, title = title_common_fall).mark_circle(size=100,stroke='black').encode(
alt.X('neighbourhood_name',title = 'Neighbourhood Name'),
alt.Y('count()', title ='Genus Count'),
alt.Color('genus_name',title='Genus Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0))).add_selection(select_neighbourhood)
common_genus_scatter_fall
By grouping the data by the neighbourhood_name, genus_name and on_street columns, we get the streets with the most color.
We also limit our analysis to streets with a count of at least 25 for spring and at least 30 for fall, to make the plots easier to read.
genus_street_s = tree_common_spring.groupby(['neighbourhood_name','genus_name','on_street']).size()
genus_street_spring = genus_street_s.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_spring_fil = genus_street_spring[genus_street_spring['genus_count']>=25]
genus_street_spring_fil.head()
| | neighbourhood_name | genus_name | on_street | genus_count |
|---|---|---|---|---|
| 0 | Marpole | PRUNUS | W 59TH AV | 41 |
| 1 | Arbutus-Ridge | PRUNUS | W 22ND AV | 38 |
| 2 | Victoria-Fraserview | PRUNUS | DUMFRIES ST | 38 |
| 3 | Renfrew-Collingwood | PRUNUS | RUPERT ST | 35 |
| 4 | Kensington-Cedar Cottage | PRUNUS | DUMFRIES ST | 34 |
Similar grouping and counting is done for fall.
genus_street_f = tree_common_fall.groupby(['neighbourhood_name','genus_name','on_street']).size()
genus_street_fall = genus_street_f.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_fall_fil = genus_street_fall[genus_street_fall['genus_count']>=30]
genus_street_fall_fil.head()
| | neighbourhood_name | genus_name | on_street | genus_count |
|---|---|---|---|---|
| 0 | Kitsilano | ACER | W 6TH AV | 59 |
| 1 | Kitsilano | ACER | W 11TH AV | 54 |
| 2 | Kitsilano | ACER | W 15TH AV | 51 |
| 3 | Kensington-Cedar Cottage | ACER | KINGSWAY | 46 |
| 4 | Shaughnessy | ACER | ANGUS DRIVE | 45 |
According to the given data, these streets offer the most fall and spring colors. The top 5 streets for fall colors are W 6TH AV, W 11TH AV, W 15TH AV, KINGSWAY and ANGUS DRIVE. Similarly, the top 5 entries for spring bloom are W 59TH AV, W 22ND AV, DUMFRIES ST (Victoria-Fraserview), RUPERT ST and DUMFRIES ST (Kensington-Cedar Cottage). DUMFRIES ST appears twice because the counts are grouped by neighbourhood as well as by street.
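The effect of the grouping keys on a street that crosses neighbourhoods can be sketched as follows. The rows are made up (they only borrow the street and neighbourhood names from the table above), and per_neighbourhood and per_street are hypothetical names.

```python
import pandas as pd

# Made-up rows: the same street name occurring in two neighbourhoods.
toy = pd.DataFrame({
    "neighbourhood_name": ["Victoria-Fraserview"] * 3 + ["Kensington-Cedar Cottage"] * 2,
    "genus_name": ["PRUNUS"] * 5,
    "on_street": ["DUMFRIES ST"] * 5,
})

# Grouping as in the analysis: one row per neighbourhood, genus and street.
per_neighbourhood = (
    toy.groupby(["neighbourhood_name", "genus_name", "on_street"])
    .size()
    .reset_index(name="genus_count")
)

# Dropping the neighbourhood key merges the street into a single row.
per_street = (
    toy.groupby(["genus_name", "on_street"])
    .size()
    .reset_index(name="genus_count")
)

print(len(per_neighbourhood))              # 2
print(per_street["genus_count"].tolist())  # [5]
```

Which grouping is preferable depends on the question: keeping the neighbourhood key makes the dropdown filter possible, while dropping it ranks whole streets.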
genus_street_fall_plot = alt.Chart(genus_street_fall_fil, title= 'Fall genuses on the streets of Vancouver').mark_circle(size =200).encode(
alt.X('on_street',title =None),
alt.Y('genus_name', title = None),
alt.Color('genus_count', scale = alt.Scale(scheme = 'yelloworangered'), legend=None),
alt.Tooltip('genus_count'))
genus_street_spring_plot = alt.Chart(genus_street_spring_fil, title= 'Spring genuses on the streets of Vancouver').mark_circle(size =200).encode(
alt.X('on_street',title =None),
alt.Y('genus_name', title = None),
alt.Color('genus_count',scale = alt.Scale(scheme ='redpurple'), legend=None),
alt.Tooltip('genus_count'))
list_neigh = sorted(list(tree_data_full['neighbourhood_name'].unique()))
dropdown_neigh = alt.binding_select(name = 'Neighbourhood',options = list_neigh)
select_neigh = alt.selection_single(fields = ['neighbourhood_name'],
bind =dropdown_neigh)
genus_street_plot = (genus_street_spring_plot.encode(opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0)))
& genus_street_fall_plot.encode(opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0)))
).resolve_scale(color='independent')
genus_street_plot.add_selection(select_neigh)
The above plot shows the streets with the highest genus counts. The dropdown menu lets us check whether a street belongs to the neighbourhood we are interested in, and a tooltip helps us keep track of the genus count.
The street-level analysis alone doesn't give the full picture, since the trees on a street may not even be on the same block. Grouping the data to include the block information gives a better idea of the exact location where we can find the trees.
genus_street_sb = tree_data_spring.groupby(['neighbourhood_name','genus_name','on_street','on_street_block']).size()
genus_street_block_spring = genus_street_sb.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_block_spring_fil = genus_street_block_spring[genus_street_block_spring['genus_count']>8]
genus_street_block_spring_fil.head()
| | neighbourhood_name | genus_name | on_street | on_street_block | genus_count |
|---|---|---|---|---|---|
| 0 | Killarney | PRUNUS | BUTLER ST | 7700 | 17 |
| 1 | Kensington-Cedar Cottage | PRUNUS | E 20TH AV | 1400 | 13 |
| 2 | Killarney | MAGNOLIA | SPARBROOK CRESCENT | 7700 | 13 |
| 3 | Victoria-Fraserview | PRUNUS | HARRISON DRIVE | 2300 | 12 |
| 4 | West Point Grey | PRUNUS | W 10TH AV | 4400 | 12 |
genus_street_fb = tree_data_fall.groupby(['neighbourhood_name','on_street','genus_name','on_street_block']).size()
genus_street_block_fall = genus_street_fb.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_block_fall_fil = genus_street_block_fall[genus_street_block_fall['genus_count']>8]
genus_street_block_fall_fil.head()
| | neighbourhood_name | on_street | genus_name | on_street_block | genus_count |
|---|---|---|---|---|---|
| 0 | Killarney | BUTLER ST | PRUNUS | 7700 | 17 |
| 1 | Mount Pleasant | ATHLETES WAY | ACER | 100 | 15 |
| 2 | South Cambie | W 22ND AV | ACER | 900 | 13 |
| 3 | Kensington-Cedar Cottage | E 20TH AV | PRUNUS | 1400 | 13 |
| 4 | Killarney | SPARBROOK CRESCENT | MAGNOLIA | 7700 | 13 |
From the tables above, the top 5 spots to see the fall colors are the 7700 block of Butler St, the 100 block of Athletes Way, the 900 block of W 22nd Av, the 1400 block of E 20th Av and the 7700 block of Sparbrook Crescent. Similarly, the spring bloom can be observed on the 7700 block of Butler St, the 1400 block of E 20th Av, the 7700 block of Sparbrook Crescent, the 2300 block of Harrison Drive and the 4400 block of W 10th Av.
Now let's plot the exact locations where we can see groups of trees to get the best view. The filtered dataframes for both spring and fall are plotted as shown. Even though the street block is stored as an int value, we need to specify it as ordinal data, since it is a street block number and not a continuous value.
slider_count_fall = alt.binding_range(name = 'Fall Genus Count',
step = 1,
min = min(genus_street_block_fall_fil['genus_count']),
max = max(genus_street_block_fall_fil['genus_count']))
radio_genus_fall = alt.binding_radio(name = 'Common Fall Genus', options = common_genus_fall )
radio_slider_select_fall = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count_fall,'genus_name':radio_genus_fall})
title_location_fall = alt.TitleParams('The Location Plot for Fall',
subtitle = 'Hover over points for getting the location details')
street_block_plot_fall = alt.Chart(genus_street_block_fall_fil, title = title_location_fall).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'yelloworangered'),legend=None),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('on_street_block',title='Block Number'),
alt.Tooltip('on_street', title ='Street Name')],
opacity = alt.condition(radio_slider_select_fall, alt.value(0.9), alt.value(0.1))).properties(height=350,width=800).add_selection(radio_slider_select_fall)
street_block_plot_fall
slider_count_spring = alt.binding_range(name = 'Spring Genus Count',
step = 1,
min = min(genus_street_block_spring_fil['genus_count']),
max = max(genus_street_block_spring_fil['genus_count']))
radio_genus_spring = alt.binding_radio(name = 'Common Spring Genus', options = common_genus_spring )
radio_slider_select_spring = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count_spring, 'genus_name':radio_genus_spring})
title_location_spring = alt.TitleParams('The Location Plot for Spring',
subtitle = 'Hover over points for getting the location details')
street_block_plot_spring = (alt.Chart(genus_street_block_spring_fil,title = title_location_spring).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None,axis=alt.Axis(grid = False)),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'redpurple'), legend=None),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('on_street_block',title='Block Number'),
alt.Tooltip('on_street', title ='Street Name')],
opacity = alt.condition(radio_slider_select_spring, alt.value(0.9), alt.value(0.1)
)).properties(height=350,width=500).add_selection(radio_slider_select_spring))
street_block_plot_spring
For both plots a tooltip is provided to get the exact location as it gives the information about neighbourhood name, street name and block number. Also, the radio buttons can be used to filter the plot for the desired genus. The genus count slider is useful to filter the plots for the number of genuses.
This project work shows where in Vancouver someone could observe the best fall and spring colors.
The fall and spring genus distribution maps show us the neighbourhoods with the largest number of trees from the genuses that produce the fall foliage and spring bloom. Since the given dataset is a randomly sampled subset of the full dataset, these observations may not reflect reality exactly. According to the data, however, the top 3 neighbourhoods to visit for fall foliage are 'Kensington-Cedar Cottage', 'Renfrew-Collingwood' and 'Hastings-Sunrise'. Similarly, to observe the spring bloom one should visit the 'Renfrew-Collingwood', 'Hastings-Sunrise' and 'Kensington-Cedar Cottage' neighbourhoods.
The distribution of the trees may not be uniform across the streets of these neighbourhoods, so we also looked at the streets with the most foliage and blooms for fall and spring respectively. It was found that the streets with the most genuses for fall and spring were not necessarily in the top neighbourhoods obtained earlier. There may be 2 reasons for this observation:
The ultimate goal of this work is to find the exact street locations where we can observe uniform colors as in Fig 1 and Fig 2. By analyzing the given dataset, we found the neighbourhood, street and block where trees of the same genus grow together to give that beautiful view. From the analysis, the 7700 block of Butler St, the 100 block of Athletes Way and the 1400 block of E 20th Av take the podium for fall colors, and the 7700 block of Butler St, the 1400 block of E 20th Av and the 7700 block of Sparbrook Crescent for the spring bloom.
Future work would be to repeat the analysis on the full dataset to find adjacent blocks where the bloom continues. With the given dataset, to name one pair for each season, I found 2 adjacent blocks in the Renfrew-Collingwood neighbourhood at 3500 Tanner St and 3600 Tanner St for the spring bloom, and in Dunbar-Southlands at 3400 W 33RD AV and 3500 W 33RD AV for fall colors. From living in Vancouver for the past 3 years, I know that there are more streets like this and that the number of adjacent blocks is often more than 2. Another extension would be to look at the height range and diameter of the trees in each block to see whether they have similar dimensions. It would also be interesting to do the same analysis by grouping the dataset on the common name of the trees.
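The search for adjacent blocks could be sketched as below. This is only an illustration under a stated assumption: consecutive blocks on a street are taken to differ by 100 in on_street_block. The counts are made up, and blocks, BLOCK_STEP, gap and pairs are hypothetical names.

```python
import pandas as pd

# Made-up block-level counts in the shape of genus_street_block_spring.
blocks = pd.DataFrame({
    "on_street": ["TANNER ST", "TANNER ST", "TANNER ST", "BUTLER ST"],
    "genus_name": ["PRUNUS", "PRUNUS", "PRUNUS", "PRUNUS"],
    "on_street_block": [3500, 3600, 3900, 7700],
    "genus_count": [9, 10, 9, 17],
})

# Assumption: consecutive blocks on a street differ by 100 in block number.
BLOCK_STEP = 100

blocks = blocks.sort_values(
    ["on_street", "genus_name", "on_street_block"]
).reset_index(drop=True)

# Difference to the previous block of the same street and genus.
gap = blocks.groupby(["on_street", "genus_name"])["on_street_block"].diff()

# A row is the second block of an adjacent pair when the gap equals BLOCK_STEP.
pairs = blocks.loc[gap == BLOCK_STEP, ["on_street", "genus_name", "on_street_block"]].copy()
pairs["previous_block"] = pairs["on_street_block"] - BLOCK_STEP
print(pairs.to_dict("records"))
```

On this toy input the only pair found is the 3500 and 3600 blocks of TANNER ST; the 3900 block is too far from 3600 to count as adjacent.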
The following dashboard gives the spring map along with the genus distribution for all genuses and the genus distribution for top genuses in spring.
(genus_count_scatter_spring & (vancouver_spring_map | common_genus_scatter_spring)).add_selection(select_neighbourhood)
Similar to the spring dashboard this dashboard gives the fall map along with the genus distribution for all genuses and the genus distribution for top genuses in fall.
genus_count_scatter_fall & (vancouver_fall_map | common_genus_scatter_fall)
The following dashboard gives the combined plot for spring and fall to find the exact location of the genuses in each neighbourhood.
title_location = alt.TitleParams('The Combined Location Plot for Fall and Spring',
subtitle = ['Hover over points for getting the location details',
'Use widgets for selecting Genus name and filtering the Genus count'],
anchor = 'middle')
street_block_location_fall = alt.Chart(genus_street_block_fall_fil).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'yelloworangered'),legend=None),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('on_street_block',title='Block Number'),
alt.Tooltip('on_street', title ='Street Name')]
).properties(title = 'Fall Plot',height=350,width=800)
street_block_location_spring = alt.Chart(genus_street_block_spring_fil).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None,axis=alt.Axis(grid = False)),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'redpurple'), legend=None),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('on_street_block',title='Block Number'),
alt.Tooltip('on_street', title ='Street Name')]).properties(title='Spring Plot',height=350,width=800)
slider_count = alt.binding_range(name = 'Fall Genus Count',
step = 1,
min = min(genus_street_block_fall_fil['genus_count']),
max = max(genus_street_block_fall_fil['genus_count']))
common_genus_fall_spring = list(set(common_genus_fall + common_genus_spring))
radio_genus = alt.binding_radio(name = 'Common Genus', options = common_genus_fall_spring)
radio_slider_select = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count,'genus_name':radio_genus})
street_block_plot_combined = (street_block_location_fall.encode(opacity = alt.condition(radio_slider_select, alt.value(0.9), alt.value(0.1)))
& street_block_location_spring.encode(opacity = alt.condition(radio_slider_select, alt.value(0.9), alt.value(0.1)))
).properties(title = title_location).add_selection(radio_slider_select)
street_block_plot_combined
Not all the work in this notebook is original. Parts that were borrowed from other resources are as follows: